Caching and Locality

Tom Davis
Silicon Graphics

November, 1994


Many computer science textbooks on algorithms assume that your computer's
memory is uniform in the sense that it is no more expensive to
access any one word than any other.  This makes the
analysis easy, but on modern computers, it is far from true.  Some
types of memory can be 1000 times slower to access than others, and
particularly bad programming can increase a program's running time by
factors of tens or hundreds.  This tutorial attempts to explain why this
happens, how to avoid it, and things to watch out for in your code.

The explanations below are necessarily simplified, but it's likely that your local processor designer would be glad to tell you the gory details if you want. Fortunately, most people are able to vastly improve their code without knowing the exact details. Of course, the more you know, the better a programmer you will be.

My terminology below is a bit non-standard. I've basically referred to all levels of the memory hierarchy as caches, since in some sense, they all behave the same way, and the same principles apply when thinking about each level. Usually, only the primary and secondary caches are referred to by the name "cache".


Part 1: Hardware Reality

What is a cache?

Computers have different kinds of memory, and generally, the faster the memory is, the more expensive it is. Caching is an automatic mechanism supported by the hardware that attempts to keep heavily used data in memory that is very fast, and little-used data in slow memory. The general idea is that when you access a word of memory, it is copied into the cache so that if you need to access it again within a short time, it will be in the highest speed memory. Of course, there's only a small amount of the highest speed memory, so if you don't access the word for a while, it will be replaced in the cache by some other word.

In addition, since memory tends to be accessed sequentially (especially machine instructions), when the computer needs to read a new word into the cache, it will often read the next few words into the cache at the same time in the hope that you will access them on successive instructions and they will already be available in the high speed memory.
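To see why this matters in practice, here is a minimal C sketch (the array size and loop structure are illustrative, not from the original text). Both loops do exactly the same work, but the first walks memory in the order it is laid out, so it uses every word of each cache line it fetches; the second skips through memory, and may pay for a new cache line on every access.

    /* Both loops zero the same array.  The first is cache-friendly
       (stride 1); the second jumps COLS doubles per access and may
       touch a new cache line every time. */
    #define ROWS 1024
    #define COLS 1024
    double a[ROWS][COLS];

    void sequential(void)          /* walks memory in layout order */
    {
        int i, j;
        for (i = 0; i < ROWS; i++)
            for (j = 0; j < COLS; j++)
                a[i][j] = 0.0;
    }

    void strided(void)             /* jumps COLS words at a time */
    {
        int i, j;
        for (j = 0; j < COLS; j++)
            for (i = 0; i < ROWS; i++)
                a[i][j] = 0.0;
    }

On a machine where a cache line holds several doubles, the first loop can easily run several times faster than the second, even though both execute the same number of instructions.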

The caches are shared by all the processes running on a machine, so just because you "know" that your program will always be running on a machine with, say, 2 megabytes of secondary cache, don't write code that will behave badly with less than that. You're sharing with other processes, and the more processes you share with, the less effective cache space your process will have.

How does a cache work?

The primary and secondary caches are accessed with something called content-addressable memory. Since the cache may contain memory from widely scattered addresses, both the address and the data must be saved. When the processor tries to access text or data using its address, all the addresses currently in the cache are simultaneously tested against it, and if there is a match, the corresponding data or computer instruction is returned. Notice that this is different from regular memory -- if the processor wants the data at address 10, it simply looks in the 10th slot; in the cache, any of the slots may hold the sought-after address in its address portion, and hence all must be searched simultaneously. The address is essentially broadcast to all the cache's address slots, and if there's a match in one of them, the corresponding data is returned.
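Hardware does all the comparisons at once; software has to loop. Still, the logic of a fully associative lookup can be written out in a few lines of C (the slot count and structure names here are made up for illustration):

    /* Sketch of a fully associative cache lookup.  Hardware tests
       all the tags simultaneously; this loop does the same tests
       one at a time. */
    #define NSLOTS 8

    struct slot {
        int      valid;     /* does this slot hold anything? */
        unsigned address;   /* the tag: which memory word is here */
        unsigned data;      /* a copy of that word's contents */
    };

    struct slot cache[NSLOTS];

    /* Return nonzero and fill *out if the address is cached. */
    int lookup(unsigned address, unsigned *out)
    {
        int i;
        for (i = 0; i < NSLOTS; i++) {      /* hardware: in parallel */
            if (cache[i].valid && cache[i].address == address) {
                *out = cache[i].data;       /* hit */
                return 1;
            }
        }
        return 0;                           /* miss */
    }

The point of the sketch is the test itself: the match is made on the stored address (the "tag"), not on the slot's position, which is why every slot has to be examined.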

What is the memory hierarchy?

Modern computers have a whole sequence of memories, each operating at a different speed. The sequence is called a memory hierarchy, essentially beginning with the registers, and each successive "cache" is larger and slower. As your program's accesses fall to larger and slower caches, the amount of memory brought in each time increases. Here's a quick description of the different sorts of caches:

Registers: the fastest memory of all, built into the processor itself and managed explicitly by the compiler.

Primary cache: a small, very fast memory, usually on the same chip as the processor.

Secondary cache: a larger, somewhat slower memory sitting between the primary cache and main memory.

Main memory: the machine's RAM -- much larger than the caches, and much slower.

Disk: when main memory fills up, the operating system moves fixed-size chunks of memory called "pages" out to disk and reads them back in when they are next touched. This is called paging, and it is by far the slowest level of the hierarchy.

Of course, pretty soon all your main memory will be in use, and to page in another page, the operating system will have to get rid of some other page. The one it chooses to get rid of depends on its paging algorithm, and there are heated debates on the best way to write such an algorithm, but whichever page it chooses, it first has to write out to disk any changes you may have made to that page. Page sizes are different on different machines, but on ours, a page is currently 4096 bytes. A typical algorithm for deciding which page to replace, and one that's not too bad, is called LRU or "least recently used", which pages out the page that hasn't been accessed for the longest time.
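As a sketch of the idea in C (the frame count and the timestamping scheme are made up for illustration; real operating systems use cheaper approximations of LRU):

    /* Sketch of least-recently-used page replacement.  Every access
       stamps the frame with a counter; the victim is the frame with
       the oldest stamp. */
    #define NFRAMES 4

    struct frame {
        int      in_use;
        unsigned page;       /* which virtual page lives here */
        int      dirty;      /* modified since it was read in? */
        unsigned last_used;  /* "time" of the most recent access */
    };

    struct frame frames[NFRAMES];
    unsigned now;            /* incremented on every access */

    void touch(int i)        /* called on every access to frame i */
    {
        frames[i].last_used = ++now;
    }

    int choose_victim(void)  /* the least recently used frame */
    {
        int i, victim = 0;
        for (i = 1; i < NFRAMES; i++)
            if (frames[i].last_used < frames[victim].last_used)
                victim = i;
        return victim;       /* write it out first if it's dirty */
    }

Stamping every access with a counter is far too expensive to do for real, which is why practical kernels approximate LRU with cheaper mechanisms such as reference bits, but the victim-selection logic is the same in spirit.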

What is a cache miss? What's the penalty for missing?

A cache miss occurs when your program can't find the word it needs in a cache (at any level). The hardware is actually a bit smarter than this, but you can think of what happens as follows: a register access always succeeds, so registers never cause a problem (on our machines, at least). The interesting stuff occurs with main memory accesses. First, the primary cache is searched for the word. If it's there, you're done: the program reads the word and continues on its merry way. If it's not there, the secondary cache is searched. If the word is found there, it (and a few of its neighbors) is copied into the primary cache, handed to the processor, and execution continues. If it's not in the secondary cache either, the word is fetched from main memory, and while the hardware is at it, copies are placed in the primary and secondary caches. Et cetera.
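The search order can be written out as a sketch in C (the function names are invented for illustration, and real hardware fills whole cache lines rather than single words):

    /* Stubs standing in for the hardware at each level of the
       hierarchy (invented names; a sketch, not a real interface). */
    int  primary_lookup(unsigned address, unsigned *word);
    int  secondary_lookup(unsigned address, unsigned *word);
    void primary_fill(unsigned address, unsigned word);
    void secondary_fill(unsigned address, unsigned word);
    unsigned main_memory_read(unsigned address);

    /* Each miss falls through to the next, slower level, and the
       word is copied into the faster caches on its way back up. */
    unsigned read_word(unsigned address)
    {
        unsigned word;

        if (primary_lookup(address, &word))      /* primary hit: done */
            return word;

        if (secondary_lookup(address, &word)) {  /* secondary hit */
            primary_fill(address, word);         /* copy it up */
            return word;
        }

        word = main_memory_read(address);        /* both caches missed */
        secondary_fill(address, word);
        primary_fill(address, word);
        return word;
    }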

Mashey's 10:1 rule of thumb

John Mashey has a rough rule of thumb that seems crazy, but turns out not to be too bad. He says that each time an access has to go to the next slower level of the memory hierarchy, it takes roughly ten times as long as it would have had the data been in the next faster level. Under this rule, if a primary cache hit costs one unit of time, a hit in the secondary cache costs about ten, and a trip to main memory costs about a hundred.

What happens when the cache is full?

When a cache is full, the same thing happens as when main memory fills up and the process page faults: new data is copied into the cache, clobbering some data that's already there. If the data that's there has been modified, it will first have to be written to a slower cache before it is clobbered. It's possible to design clever hardware that tries to write modified cache entries back to the slower caches whenever time and bus cycles are available, but there's no guarantee that this will happen, and every cache miss may force the processor to write the changed data back to the slower caches.
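Continuing the earlier sketch (again with invented names), the replacement step looks something like this:

    /* Sketch of replacement with write-back.  If the victim slot
       holds modified ("dirty") data, that data must be written to
       the slower level before the slot is reused. */
    struct line {
        int      valid;
        int      dirty;     /* modified since it was brought in? */
        unsigned address;
        unsigned data;
    };

    void slower_write(unsigned address, unsigned data);  /* stub */

    void replace(struct line *victim, unsigned address, unsigned data)
    {
        if (victim->valid && victim->dirty)
            slower_write(victim->address, victim->data); /* write back */

        victim->valid   = 1;
        victim->dirty   = 0;   /* fresh copy matches the slower level */
        victim->address = address;
        victim->data    = data;
    }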

Instruction caches and data caches

On many machines (ours included), caches for computer instructions and data are treated differently. That's because the instructions themselves are never changed in normal operation. Obviously, the code that loads the instructions in the first place violates this idea, and so would self-modifying code, but in both cases the programmer has to do something special to modify the code's instructions (often called "text", as opposed to "data"). Thus misses in the instruction cache never require the processor to write modified data back to the slower caches. Data caches always have to be on the lookout for this, and are consequently more expensive to implement.




Part 2: Software Considerations

Data:

Code:

Algorithms: